An Efficient Approach to Identifying and Validating Clusters in Multivariate Datasets with Applications in Gene Expression Analysis
Abstract
Gene expression data analysis has become an important topic in bioinformatics due to its wide application in the biomedical industry. Effective analysis of gene expression data depends on data mining methods, especially clustering techniques. Various kinds of clustering methods have been proposed, yet they do not satisfy the requirements of high efficiency, high quality and automation in the mining of gene expression data. In this paper, we propose an efficient and automatic clustering approach that is suitable for gene expression analysis. The proposed approach primarily employs similarity-matrix based clustering techniques, complemented by new heuristics for reducing the computation cost. In particular, a novel validation technique is incorporated for evaluating the quality of the discovered gene expression patterns. Empirical evaluation on different gene expression datasets indicates that the proposed approach performs better than other methods in terms of efficiency, clustering quality and automation.
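The abstract does not give the details of the clustering or validation steps, so the following is only a minimal sketch of the general idea of similarity-matrix based clustering followed by a validation score. It is not the authors' algorithm. The Pearson correlation measure, the threshold-based grouping, the particular validation score, and all function names and sample values are assumptions chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def similarity_matrix(profiles):
    """Pairwise similarity between all profiles (assumed measure: Pearson)."""
    n = len(profiles)
    return [[pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]


def threshold_clusters(sim, threshold):
    """Label connected components of the graph where sim[i][j] >= threshold."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def mean(values):
    """Arithmetic mean, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def validation_score(sim, labels):
    """Mean within-cluster similarity minus mean between-cluster similarity."""
    within, between = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            target = within if labels[i] == labels[j] else between
            target.append(sim[i][j])
    return mean(within) - mean(between)


# Hypothetical toy data: two rising profiles and two falling profiles.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 6.0, 8.2],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.1, 4.0, 2.1],
]
sim = similarity_matrix(profiles)
labels = threshold_clusters(sim, threshold=0.9)
score = validation_score(sim, labels)
print(labels, round(score, 3))
```

In a sketch like this, the validation score could be computed for several candidate thresholds and the best-scoring clustering kept, which is one simple way to reduce manual parameter tuning.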
Similar resources
A Clustering Approach by SSPCO Optimization Algorithm Based on Chaotic Initial Population
The main task of clustering analysis is to assign a set of objects to groups such that objects in one group, or cluster, are more similar to each other than to objects in other clusters. The SSPCO optimization algorithm is a new optimization algorithm inspired by the behavior of a type of bird called the see-see partridge. One of the things that smart algorithms are applied to solve is the problem ...
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
An Improved SSPCO Optimization Algorithm for Solve of the Clustering Problem
Swarm Intelligence (SI) is an innovative artificial intelligence technique for solving complex optimization problems. Data clustering is the process of grouping data into a number of clusters. The goal of data clustering is to make the data in the same cluster share a high degree of similarity while being very dissimilar to data from other clusters. Clustering algorithms have been applied to a ...
Study of Gene Expression Signatures for the Diagnosis of Pediatric Acute Lymphoblastic Leukemia (ALL) Through Gene Expression Array Analyses
Background: Acute lymphoblastic leukemia (ALL), the most common malignancy in children, is associated with high mortality and significant relapse. Currently, the non-invasive diagnosis of pediatric ALL is a major challenge in the early detection of patients. In the present study, a systems biology approach was used through network-based analysis to identify the key candidate genes related to AL...
Journal:
- J. Inf. Sci. Eng.
Volume 20
Publication date: 2004